-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Allow for zero-width fixed size lists #18940
Conversation
393c918
to
01f6abd
Compare
This PR properly implements `polars.Array(..., 0)`. This does make a few fundamental changes to how `FixedSizeListArray` works internally. Mainly, it now keeps a `length` field that actually determines the length instead of relying the `values.len() / size`. Fixes pola-rs#18921, pola-rs#16878.
01f6abd
to
0fa59f9
Compare
} else { | ||
unreachable!("`GrowableFixedSizeList` expects `DataType::FixedSizeList`") | ||
}; | ||
let size = arrays[0].size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: we can already get this size from the array, and the array has the invariant that the dtype must be FSL
let offsets = list.offsets().buffer().iter(); | ||
let expected = | ||
(list.offsets().first().to_usize()..list.len()).map(|ix| O::from_as_usize(ix * size)); | ||
|
||
match offsets | ||
.zip(expected) | ||
.find(|(actual, expected)| *actual != expected) | ||
{ | ||
Some(_) => polars_bail!(ComputeError: | ||
"not all elements have the specified width {size}" | ||
), | ||
None => { | ||
let sliced_values = list.values().sliced( | ||
list.offsets().first().to_usize(), | ||
list.offsets().range().to_usize(), | ||
); | ||
cast(sliced_values.as_ref(), inner.dtype(), options)? | ||
}, | ||
let start_offset = list.offsets().first().to_usize(); | ||
let offsets = list.offsets().buffer(); | ||
|
||
let mut is_valid = true; | ||
for (i, offset) in offsets.iter().enumerate() { | ||
is_valid &= offset.to_usize() == start_offset + i * size; | ||
} | ||
|
||
polars_ensure!(is_valid, ComputeError: "not all elements have the specified width {size}"); | ||
|
||
let sliced_values = list | ||
.values() | ||
.sliced(start_offset, list.offsets().range().to_usize()); | ||
cast(sliced_values.as_ref(), inner.dtype(), options)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: I feel like this is much clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let is_valid = offsets.iter().enumerate().all(|(i, offset)| offset.to_usize() == start_offset + i * size);
pub fn to_fixed_size_list(self, size: usize, is_nullable: bool) -> ArrowDataType { | ||
ArrowDataType::FixedSizeList( | ||
Box::new(Field::new( | ||
PlSmallStr::from_static("item"), | ||
self, | ||
is_nullable, | ||
)), | ||
size, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added so that we don't constantly have to call .to_physical
in reshape
.
@@ -41,6 +41,7 @@ pub fn read_fixed_size_list<R: Read + Seek>( | |||
)?; | |||
|
|||
let (field, size) = FixedSizeListArray::get_child_and_size(&dtype); | |||
polars_ensure!(size > 0, nyi = "Cannot read zero sized arrays from IPC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this should be implemented, so I just left it like this for now.
if *self_width == 0 { | ||
return Bitmap::new_with_value(false, self.len()); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[] != [] = False
// # Try to Chunked Arrays | ||
pub fn try_bool(&self) -> Option<&BooleanChunked> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: this allows us to avoid instantiating the errors.
ArrowDataType::FixedSizeList(field, size) => { | ||
feature_gated!("dtype-array", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive-by: This is already feature-gated, no real sense in feature gating it again
@@ -924,7 +923,7 @@ def test_parquet_array_dtype_nulls() -> None: | |||
[[]], | |||
[[None]], | |||
[[[None], None]], | |||
[[[None], []]], | |||
[[[None], [None]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a clear error
@pytest.mark.parametrize("shape", [(0, 1), (1, -1), (-1, 1)]) | ||
def test_reshape_empty_invalid_2d(shape: tuple[int, ...]) -> None: | ||
s = pl.Series("a", [], dtype=pl.Int64) | ||
with pytest.raises( | ||
InvalidOperationError, | ||
match=re.escape( | ||
f"cannot reshape empty array into shape {display_shape(shape)}" | ||
), | ||
): | ||
s.reshape(shape) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the cases now fail
Hi, just FYI you need to repeat the "Fixes" for each issue number to automatically close multiple issues. Currently #16878 is not closed ;) |
This PR properly implements
polars.Array(..., 0)
. This does make a fewfundamental changes to how
FixedSizeListArray
works internally. Mainly, itnow keeps a
length
field that actually determines the length instead ofrelying on the
values.len() / size
.Fixes #18921, #16878.